Trading Consequences: A Case Study of Combining Text Mining and Visualization to Facilitate Document Exploration
Abstract
Large-scale digitization efforts and the availability of computational methods, including text mining and information visualization, have enabled new approaches to historical research. However, we lack case studies of how these methods can be applied in practice and what their potential impact may be. Trading Consequences is an interdisciplinary research project between environmental historians, computational linguists and visualization specialists. It combines text mining and information visualization alongside traditional research methods in environmental history to explore commodity trade in the nineteenth century from a global perspective. Along with a unique data corpus, this project developed three visual interfaces to enable the exploration and analysis of four historical document collections, consisting of approximately 200,000 documents and 11 million pages related to commodity trading. In this paper we discuss the potential and limitations of our approach based on feedback we elicited from historians over the course of this project. Informing the design of such tools in the larger context of digital humanities projects, our findings show that visualization-based interfaces are a valuable starting point for large-scale explorations in historical research. Besides providing multiple visual perspectives on the document collection to highlight general patterns, it is important to provide a context in which these patterns occur and offer analytical tools for more in-depth investigations.
Related resources
Trading Consequences: A Case Study of Combining Text Mining & Visualisation to Facilitate Document Exploration
Trading Consequences is an interdisciplinary research project between historians, computational linguists and visualization specialists. We use text mining and visualisations to explore the growth of the global commodity trade in the nineteenth century. Feedback from a group of environmental historians during a workshop provided essential information to adapt advanced text mining and visualisat...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...
Methodology for Validation of Issuance of Mystical and Ethical Narrations (A Case Study and Discourse Analysis on the Methodology of the Book Sirr ul-asra’)
The book “The Secret of Prophet Mohammad’s Midnight Journey to the Seven Heavens in Explanation of Al-Mi’raj Hadith” was written by Ayatollah Sa’adatparvar. This paper investigates his method of recognizing this hadith by analyzing the discourse of a part of the book's introduction. The paper aims at investigating the particular discourse pattern of the author in analyzing the document of ...
Digitised historical text: Does it have to be mediOCRe?
This paper reports on experiments to improve the Optical Character Recognition (OCR) quality of historical text as a preliminary step in text mining. We analyse the quality of OCRed text compared to a gold standard and show how it can be improved by performing two automatic correction steps. We also demonstrate the impact this can have on named entity recognition in a preliminary extrinsic eval...
HCI Empowered Literature Mining for Cross-Domain Knowledge Discovery
This paper presents an exploration engine for text mining and cross-context link discovery, implemented as a web application with a user-friendly interface. The system supports experts in advanced document exploration by facilitating document retrieval, analysis and visualization. It enables document retrieval from public databases like PubMed, as well as by querying the web, followed by documen...
Journal title: DSH
Volume: 30
Publication date: 2015